We propose a method for obtaining an improved representation
of transients in audio signals. The representation is based on a
damped sinusoidal model. To improve the representation, transient
locations are modified in such a way that a transient can start only
at the beginning of a sinusoidal segment. The introduced
modifications facilitate a reduction of the number of damped sinusoids
needed to model a transient well and the elimination of pre-echo
artifacts. With a listening test we verify that the modifications do
not result in a perceptual difference between the original and
modified audio signals.

1. INTRODUCTION

Parametric coding of audio is a popular tool for representing audio
signals at very low bit rates [1, 2, 3, 4, 5]. In a parametric audio
coder, a signal is represented by a model, and the parameters of the
model are estimated and encoded. A popular parametric representation
of audio signals is based on a decomposition of an original signal
into three components: a transient component, a tonal (sinusoidal)
component, and a noise component [1, 4, 5].
Having a dedicated model for the transient component proved to
be beneficial for parts of audio signals with sharp attacks, because
sinusoidal and noise models cannot represent those perceptually
important events efficiently [6].

We propose a method for an improved representation of transients.
It was shown in [7] that transients can be modeled efficiently
using sinusoids with exponentially modulated amplitudes
(damped sinusoids). An audio signal is analysed on a
segment-by-segment basis, and each segment is represented as a sum of
damped sinusoids. A problem occurs when a transient does not
start at the beginning of a segment. Compared to the case where
a transient starts at the beginning of a segment, the number of
damped sinusoids needed to model the transient well increases
considerably. If a transient is not modeled properly, the modeling
error is distributed over the whole segment, resulting in audible
pre-echoes. Different methods have been used to solve this problem:

• Allow a one-sample-precision (full-precision) variable
segmentation of the signal, such that transients will always start
at the beginnings of segments (e.g., [1]).

• Allow a switching between a long and a short window defining
analysis segments, such that short windows are used for
parts of an audio signal with sharp attacks (e.g., MPEG-1
Layer III audio coding algorithm [8]). In this case, the
segmentation is defined simply by the lengths of the long and
the short windows.

In this paper, we use a restricted time segmentation. By restricted
segmentation we mean that the segment lengths are defined by
integer multiples of a predefined minimum segment length, say 5 ms.
Given such a restricted time segmentation, we modify the transient
component of the audio signal such that a transient can start only
at the beginning of a segment. This will result in an efficient
representation of transients with damped sinusoids. The advantages
of this method as compared to the full-precision variable
segmentation are the following:

• The restricted segmentation significantly simplifies the
analysis procedure in an audio coder.

• The restricted segmentation results in a reduction of the
number of bits needed to describe the segmentation.

The remainder of this paper is organized as follows. The procedure
to modify transient locations is described in Section 2.
Modeling with damped sinusoids is described in Section 3. Results of
computer simulations and listening tests are presented in Section 4.
Finally, conclusions are summarized in Section 5.

2. MODIFICATION OF TRANSIENT LOCATIONS 

In [9] we presented a method for modifying transient locations in
an audio signal. The transient component of the audio signal is
estimated using a model based on duality between the time and
the frequency domain, as presented in [10]. This transient model
is good for very short transients, i.e., with a sharp attack and a fast
decay. Transient locations are modified by modifying parameters
of a frequency-domain representation of the transient component.

This paper presents an improved method for modifying transient
locations. In this new method, an audio signal is modified in
the following steps:

1. The beginnings and ends of transients are detected using an
energy-based approach with two sliding rectangular windows,
as presented in [11].

2. The samples between the beginning and the end of each
transient are shifted (essentially, cut-and-paste) to the
locations specified by sinusoidal segmentation.

3. The signal parts in between transients are time-warped in
order to fill the intervals between the shifted transients.

The advantages of the new transient modification method over the
one presented in [9] are the following:







• The transient detection model of [11] provides good results
also for transients with slow decay.

• The time-warping of the signal parts in between transients
is based on the knowledge of properties of sound perception,
such as pitch perception and temporal masking effects.

• The new modification method results in a lower computational
complexity.

The transient detection approach of [11] used in step 1 is based
on the evaluation of the criterion function, $C(n)$:

                                                              (1)

where $E_l(n)$ and $E_r(n)$ are the energies of the input signal $s$
within rectangular windows of a given length on the left- and
right-hand side of a time sample $n$. Significant peaks of the
criterion function $C(n)$ correspond to the starts of transients.
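
The two-window energy comparison can be sketched as follows. The body of Eq. (1) is not legible in this text, so the criterion used here, log(E_r/E_l) multiplied by E_r, is only an assumed stand-in for illustration and may differ from the one in [11]. The function name `onset_criterion`, the window length `win` and the regularizing constant `eps` are likewise hypothetical.

```python
import numpy as np

def onset_criterion(x, win=256, eps=1e-12):
    """Assumed criterion: log(E_r / E_l) * E_r with two sliding windows.

    E_l(n) is the energy of x[n-win:n], E_r(n) the energy of x[n:n+win].
    Windows are truncated at the signal borders.
    """
    x = np.asarray(x, dtype=float)
    # Running sum of squares gives every window energy in O(1).
    csum = np.concatenate(([0.0], np.cumsum(x ** 2)))
    n = np.arange(len(x))
    lo = np.clip(n - win, 0, len(x))
    hi = np.clip(n + win, 0, len(x))
    e_left = csum[n] - csum[lo]
    e_right = csum[hi] - csum[n]
    return np.log((e_right + eps) / (e_left + eps)) * e_right

# Usage: silence followed by a decaying burst starting at sample 1000.
k = np.arange(400)
x = np.zeros(2000)
x[1000:1400] = np.exp(-0.01 * k) * np.cos(0.5 * k)
c = onset_criterion(x)
onset = int(np.argmax(c))
```

In this toy case the largest peak of the criterion coincides with the first sample of the burst. A practical detector would additionally need a rule for what counts as a "significant" peak, which is not specified here.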

Step 2 of the new transient modification method is obvious.
We now describe step 3 of the modification method. Due to
modification of transient locations, the distance between two transients
can become longer or shorter. In order to fill the interval between
the shifted transients, the signal part in between has to be
time-warped correspondingly. The time-warping of the signal is done
in such a way that it preserves the correct amplitudes of the edge
points of the signal part in between the transients. Thus, no
discontinuities are introduced just before or after a transient. The signal
in between transients is stretched or compressed in time. To
compute the amplitudes at the new integer sampling instances based
on the known amplitudes of the original samples, an approximation
of the ideal bandlimited interpolation based on sinc functions
is used. To compute the amplitude of each new sample, the
amplitudes of eight original samples are used, four at each side of the
new sample. A Hanning window is used to limit the length of the
sinc functions.
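
A minimal sketch of such an eight-tap, Hanning-windowed sinc interpolation is given below. The function name `warp_segment`, the linear mapping of new sample positions onto the original time axis, and the repetition of edge samples at the borders are assumptions made for illustration. No anti-aliasing adjustment is made for compression, which is only reasonable for length changes of a few percent.

```python
import numpy as np

def warp_segment(x, new_len, half_taps=4):
    """Resample x to new_len samples with a Hann-windowed sinc kernel.

    Each output sample uses 2 * half_taps original samples
    (four on each side for half_taps=4).
    """
    x = np.asarray(x, dtype=float)
    old_len = len(x)
    # First and last output samples coincide with the first and last
    # input samples, so the edge amplitudes are preserved.
    pos = np.linspace(0.0, old_len - 1, new_len)
    out = np.empty(new_len)
    for i, t in enumerate(pos):
        base = int(np.floor(t))
        k = np.arange(base - half_taps + 1, base + half_taps + 1)
        d = t - k  # distance to each neighbour, in samples
        window = 0.5 * (1.0 + np.cos(np.pi * d / half_taps))
        h = np.sinc(d) * window
        kk = np.clip(k, 0, old_len - 1)  # repeat edge samples at the borders
        out[i] = np.dot(h, x[kk])
    return out

# Usage: stretch a slowly varying sinusoid by 1 %.
n = np.arange(1000)
x = np.sin(2 * np.pi * 0.01 * n)
y = warp_segment(x, 1010)
```

When a new sample falls exactly on an original sample, all kernel taps except the central one vanish, so the original amplitude is returned unchanged.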

For tonal signals, stretching or compressing the signal in
time results in a corresponding change of fundamental frequency,
$f_0$. The goal of the modification procedure is to ensure that the
induced modification of $f_0$ is not audible. Therefore, the following
algorithm is proposed for time-warping a signal part in between
two shifted transients (the steps are illustrated in Figure 1 for the
case where the length of the signal between two shifted transients
is longer than the original; the opposite case is treated similarly):

a) If the required change in length of a signal part in between
two transients results in a change of $f_0$ by no more than
0.2 %, then simply use the time-warping method as
described above (Figure 1a). Else go to step b.
Motivation: from the literature on psychoacoustics it is
known that changing $f_0$ of a tonal sound by 0.1 % can be
audible [12]. Our experiments verified this result.

b) Split the signal part in between two transients into two
nonoverlapping intervals: the first interval is located
directly after the end of the first transient and lasts 10 ms
(interval 1 in Figure 1b), and the second interval is the
remaining part, i.e., it lasts until the beginning of the second
transient (interval 2 in Figure 1b). The lengths of the two
intervals are modified by time-warping. If the required
change in length of the signal part in between two transients
can be done by changing $f_0$ in the interval 1 by no more
than 2 % and in the interval 2 by no more than 0.2 %, then
time-warp the signal in the two intervals correspondingly.
Else go to step c.
Motivation: the interval directly after the end of a transient
is characterized by a strong masking effect from the
transient. Therefore, larger changes of the signal in this interval
are possible before they become audible. Our experiments
verified that a change of $f_0$ by no more than 2 % in the
10 ms interval directly after the end of a transient is inaudible.

c) Time-warp the signal in the two intervals such that the
resulting change of $f_0$ is no more than 2 % in the interval
1 and no more than 0.2 % in the interval 2. If the resulting
change in length is not sufficient to fill the distance between
the shifted transients, then apply an overlap-add procedure
with a Hanning window using samples from the two
intervals in order to increase or decrease the length of the signal.
To ensure a smooth transition between two intervals, the
length of the overlap-add region is chosen to be larger than
required to obtain a correct length of the signal in between
two transients (Figure 1c).

Figure 1: Modification of transient locations. The new locations
of transient beginnings are depicted with small arrows. The signal
part in between two transients becomes longer. Steps a, b, c are
explained in Section 2.
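
The choice among steps a, b and c can be sketched as a small decision function. This is a hedged illustration, not the authors' implementation: the function name `choose_warp_step` and its parameters are hypothetical, and the "change of f0" is interpreted here as the relative change |f0'/f0 - 1|, with a stretch by a factor r scaling f0 by 1/r.

```python
def choose_warp_step(old_len, new_len, fs=44100, head_ms=10.0,
                     tol_head=0.02, tol_rest=0.002):
    """Return 'a', 'b' or 'c' for a signal part of old_len samples
    that must occupy new_len samples after the transients are shifted."""
    # Step a: uniform time-warping changes f0 by the inverse length ratio.
    f0_change = abs(old_len / new_len - 1.0)
    if f0_change <= tol_rest:
        return "a"
    # Step b: interval 1 (first 10 ms) tolerates 2 %, interval 2 only 0.2 %.
    n1 = min(old_len, round(fs * head_ms / 1000.0))
    n2 = old_len - n1
    longest = n1 / (1.0 - tol_head) + n2 / (1.0 - tol_rest)
    shortest = n1 / (1.0 + tol_head) + n2 / (1.0 + tol_rest)
    if shortest <= new_len <= longest:
        return "b"
    # Step c: warp up to the limits, then overlap-add for the remainder.
    return "c"

# Usage: a 100 ms signal part (4410 samples at 44.1 kHz).
steps = [choose_warp_step(4410, n) for n in (4415, 4425, 4500, 4400)]
```

For the 4410-sample example, a 5-sample extension stays within the 0.2 % limit (step a), a 15-sample extension needs the extra tolerance of the 10 ms interval (step b), and a 90-sample extension exceeds both limits (step c).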

3. MODELING WITH DAMPED SINUSOIDS

It was shown in [7] that a transient can be modeled efficiently
using a damped sinusoidal model. This model aims at approximating
a signal $s$ by a sum of, say $M$, sinusoids with exponentially
modulated amplitudes, i.e.,

$$\hat{s}(n) = \sum_{m=1}^{M} a_m e^{d_m n} \cos(\omega_m n + \varphi_m), \qquad n = 0, \ldots, N-1, \qquad (2)$$

where $a_m, d_m, \omega_m, \varphi_m \in \mathbb{R}$ denote the amplitude,
damping coefficient¹, angular frequency and phase of the $m$th sinusoidal
component, respectively. $N \in \mathbb{N}$ is the segment length.
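
A short synthesis sketch may help to make the model concrete. It assumes the common form in which the m-th component is a_m exp(d_m n) cos(w_m n + phi_m) for n = 0, ..., N-1; the function name `damped_sinusoids` and the argument names are hypothetical.

```python
import numpy as np

def damped_sinusoids(amps, damps, freqs, phases, n_samples):
    """Sum of M exponentially modulated sinusoids over n = 0..n_samples-1."""
    n = np.arange(n_samples)[:, None]  # column of time indices
    a, d, w, p = (np.asarray(v, dtype=float)[None, :]
                  for v in (amps, damps, freqs, phases))
    # Each column is one component; summing over columns gives the model.
    return np.sum(a * np.exp(d * n) * np.cos(w * n + p), axis=1)

# Usage: one decaying component and one growing component.
s_hat = damped_sinusoids(amps=[1.0, 0.5], damps=[-0.01, 0.002],
                         freqs=[0.3, 1.2], phases=[0.0, np.pi / 4],
                         n_samples=220)
```

A negative damping value gives a decaying envelope, and a positive value gives a growing one.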

The sinusoidal parameters $a_m$, $d_m$, $\omega_m$ and $\varphi_m$ can be
selected with a number of methods, including spectral peak-picking,
subspace-based analysis techniques and analysis-by-synthesis
methods. For the experiments described in this paper
we used the matching pursuit algorithm [13], a particular
analysis-by-synthesis method. The matching pursuit algorithm is a greedy
iterative algorithm which projects at each iteration a signal onto
the function (in our case a damped sinusoid) that best matches the
signal and subtracts this projection to form a residual signal to be
approximated in the next iteration.
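
The greedy projection loop can be sketched as follows. This is a simplified illustration rather than the procedure of [13]: it assumes a finite, real-valued dictionary of unit-norm damped cosine and sine atoms on a hypothetical grid of damping and frequency values, and the names `damped_dictionary` and `matching_pursuit` are made up for this example.

```python
import numpy as np

def damped_dictionary(n_samples, dampings, freqs):
    """Unit-norm damped cosine and sine atoms on a (damping, frequency) grid."""
    n = np.arange(n_samples)
    atoms = []
    for d in dampings:
        for w in freqs:
            for phase in (0.0, -np.pi / 2):  # cosine atom, then sine atom
                g = np.exp(d * n) * np.cos(w * n + phase)
                atoms.append(g / np.linalg.norm(g))
    return np.stack(atoms, axis=1)

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy decomposition: pick the best-matching atom, subtract, repeat."""
    residual = np.asarray(signal, dtype=float).copy()
    picks = []
    for _ in range(n_iter):
        corr = dictionary.T @ residual        # inner products with all atoms
        k = int(np.argmax(np.abs(corr)))      # atom that best matches
        picks.append((k, float(corr[k])))
        residual -= corr[k] * dictionary[:, k]  # subtract the projection
    return picks, residual

# Usage: a signal that is exactly three times one dictionary atom.
D = damped_dictionary(64, dampings=[-0.05, 0.0], freqs=[0.3, 0.9, 1.7])
x = 3.0 * D[:, 4]
picks, residual = matching_pursuit(x, D, n_iter=1)
```

Because the atoms have unit norm, subtracting the projection never increases the residual energy, which is why the SNR grows with the number of selected components.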

In order to find an "optimal" time segmentation we used the
algorithm proposed in [14]. By optimal we mean optimal in a
rate-distortion sense. This algorithm divides the input signal $s$ into
non-overlapping segments and tries, by combining these segments,
to find the partitioning of $s$ that minimizes the distortion given a
target bit budget or a given number of sinusoidal components.
Under the assumption of additivity of rate and distortion over the
constituent segments, the globally optimal segmentation is found by first
minimizing the rate versus distortion for each segment independently,
and then, using dynamic programming, finding the optimal
segmentation by combining these optimally encoded segments. By doing so,
the algorithm gives the optimal time segmentation of $s$, as well as
the number of sinusoidal components to allocate to the individual
segments.
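
The dynamic-programming step relies only on the additivity of the per-segment cost. The sketch below shows that step in isolation, under the assumption that a cost for every candidate segment (for example, a combined rate and distortion value) is already available through a hypothetical callable `segment_cost(start, end)`. It does not reproduce the rate allocation of [14].

```python
def optimal_segmentation(n_blocks, segment_cost, max_blocks=None):
    """Minimum-cost partition of blocks 0..n_blocks into contiguous segments.

    Segment boundaries lie on multiples of the minimum segment length,
    so a segment is described by block indices (start, end).
    """
    best = [0.0] + [float("inf")] * n_blocks  # best[e]: cost of blocks [0, e)
    back = [0] * (n_blocks + 1)               # start of the last segment
    for end in range(1, n_blocks + 1):
        first = 0 if max_blocks is None else max(0, end - max_blocks)
        for start in range(first, end):
            cost = best[start] + segment_cost(start, end)
            if cost < best[end]:
                best[end], back[end] = cost, start
    # Trace the stored boundaries back from the end of the signal.
    segments, end = [], n_blocks
    while end > 0:
        segments.append((back[end], end))
        end = back[end]
    return segments[::-1], best[n_blocks]

# Usage: a toy cost that penalizes segments containing a transient
# (at block boundary 3) anywhere but at their beginning.
toy_cost = lambda a, b: 1.0 + (10.0 if a < 3 < b else 0.0)
segments, total = optimal_segmentation(6, toy_cost)
```

With this toy cost the cheapest partition places a segment boundary exactly at the transient, which mirrors why transients aligned with segment beginnings are cheaper to model.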

4. EXPERIMENTAL RESULTS

Below, we present results of computer simulations and listening
tests with audio signals. The signals are mono, sampled at
44.1 kHz. The test excerpts include castanets, bass, ABBA, Celine
Dion, Metallica, harpsichord, and Suzanne Vega. Transient locations
are modified according to a time grid of 220 samples (ca. 5 ms).
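
As a small worked example of the grid arithmetic, the sketch below converts the 220-sample grid to milliseconds and moves detected transient starts onto grid points. Moving each transient to the nearest grid point is an assumption made here for illustration; in the method itself the target locations are those specified by the sinusoidal segmentation. The function names are hypothetical.

```python
def grid_spacing_ms(grid=220, fs=44100):
    """Duration of one grid step in milliseconds."""
    return 1000.0 * grid / fs

def snap_to_grid(onsets, grid=220):
    """Move each onset (in samples) to the nearest multiple of grid.

    Ties are rounded up; integer arithmetic avoids rounding surprises.
    """
    return [((n + grid // 2) // grid) * grid for n in onsets]

# Usage
spacing = grid_spacing_ms()  # about 4.99 ms
snapped = snap_to_grid([0, 100, 110, 219, 1000])
```

With nearest-point snapping, no transient moves by more than half a grid step, i.e. 110 samples or about 2.5 ms.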

¹ The damping coefficient can be any real number. Positive values
of $d_m$, therefore, correspond to expanding amplitudes rather than to truly
damped amplitudes.



Excerpt         Duration, s    # detected     Correct
                               transients     responses, %

castanets       7.1            43             [illegible]
bass            10.8           16             52.5
ABBA            9.9            29             15.0
Celine Dion     12.[?]         26             52.5
Metallica       10.1           19             52.5
harpsichord     11.7           9              40.0
Suzanne Vega    10.1           13             4[?].5

Table 1: Results of the listening test on audibility of signal
modifications which include shifting transients and time-warping the
signal parts in between transients.



It is important to verify that the introduced modifications do
not result in any audible difference between the original and
modified audio signals. To do that we performed a subjective listening
test in which signal triplets AOB were presented to listeners. Here
O is the original signal, A or B is the original signal and B or A is
the modified signal. The task of a listener was to respond whether
the modified signal was A or B. For each test excerpt, the triplets
AOB were presented to a listener 5 times; each time the position
of the modified signal (A or B) was changed randomly. Eight
experienced listeners participated in the test. The results averaged
over all listeners are presented in Table 1. They confirm that the
introduced modifications are not audible.

Next, we illustrate the improvement due to the modification
procedure. We study the performance of a damped sinusoidal
model for an original signal (i.e., transients start at arbitrary
locations) and for a modified signal (i.e., transients can start only
at the beginnings of sinusoidal segments). The methods used to
evaluate the performance are the same as in [9]. The performance
is studied in terms of signal-to-noise ratio (SNR) versus the
number of damped sinusoids and is illustrated in Figure 2, where
it is presented for a particular transient of the castanets signal. It
is evident that more sinusoids are needed to model the transient
with a certain quality in the case when the transient does not start
at the beginning of a sinusoidal segment. The lower plots of
Figures 3 and 4 show the reconstruction with 25 damped sinusoids of
the original and the modified transients, respectively. The original
transient does not start at the beginning of the segment and, as a
result, the modeling error is distributed to samples before the
transient. This results in a clearly audible pre-echo. On the other hand,
the modified transient starts at the beginning of the segment and,
as a result, the pre-echo problem is eliminated.
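
For reference, the SNR of a reconstruction can be computed as the ratio of signal energy to modeling-error energy, expressed in decibels. The sketch below uses this standard definition; the exact evaluation procedure of [9] is not reproduced here, and the function name `snr_db` is hypothetical.

```python
import numpy as np

def snr_db(original, reconstruction):
    """10 * log10(signal energy / error energy) for one segment."""
    original = np.asarray(original, dtype=float)
    error = original - np.asarray(reconstruction, dtype=float)
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(error ** 2))

# Usage: a reconstruction that is 10 % too small everywhere.
value = snr_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9])
```

An amplitude error of 10 % corresponds to an energy ratio of 100, i.e. an SNR of 20 dB.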

5. CONCLUSIONS 

In this paper, we elaborated on the idea of modifying transient
locations in audio signals for improved modeling and coding of
audio. We presented a new method for modifying transient
locations. The introduced modifications facilitate an efficient
representation of transients with damped sinusoids and the elimination of
pre-echo artifacts. We also verified that the modifications are not
audible.

It has to be noted, however, that a straightforward application 
of the modification procedure is not suitable for stereo signals. The 
reason for this is that an independent modification of transient lo- 
cations in two channels may destroy the original stereo image. We 
are currently working on this issue.

[Plot axis label: Number of damped sinusoids]

Figure 2: Performance of a damped sinusoidal model in the case
of a restricted segmentation for an original and a shifted transient.
The minimum segment length is 5 ms.

Figure 3: The original transient and its reconstruction with 25
damped sinusoids. The minimum segment length is 5 ms.




Figure 4: The shifted transient and its reconstruction with 25
damped sinusoids. The minimum segment length is 5 ms.